Abstract 

This essay describes a research program at the intersection of physical theory and computability.
A new self-reference problem is posed which, it is argued, shows that for any actually
testable physical theory there necessarily exists a physical system that cannot be tested, namely
the hardware system that implements a computation of the theory's predictions. The argument
proceeds in two stages. The first is to present a universal model for a physical implementation
of a computation. In the second we create a chain of models, each of which describes the
dynamics of the hardware underlying the previous model. It is argued that the solution to the
Physical Self-Reference Problem is the limit of this sequence, but that the error at each step
grows exponentially with the step index. The result is that finite iterations do not converge
to the solution but rather diverge from it.

1 Introduction 

My research has been focused on the intersection of computability theory and physical theory, 
and whether it is possible to have a 'Theory of Everything' (TOE) to which all questions can 
in principle be referred. Computability theory would suggest that the answer to this question is 
that no such theory is possible. However the dominant philosophical perspective in the physical 
sciences, particularly Physics and Neuroscience, seems to be the diametrically opposite perspective, 
that computability theory has nothing to say in these subject areas, and that ultimately a theory 
will exist that will be capable of answering any question in these areas. In an attempt to address the
arguments used by physical scientists, I have constructed what I call the Physical Self-Reference 
Problem. 

The Physical Self-Reference Problem (PSRP) asks if it is possible to have a computation, or 
analytic expression, that describes the dynamics of the physical system that performed that
computation, or generated the analytic expression. The key point is that in order to test a theory
one needs to perform a computation in order to generate data to test. If the dynamics of that
computation process necessarily exceed our modeling ability then there necessarily exists at least 
one physical system which we cannot model, and hence no 'Theory of Everything' is possible. 

My work has focused primarily on the Turing computability case. Of the many other choices for 
what a calculation might look like, the analytic case seems to be the major alternative in actual use. 
The analytic case however has elements that have no clear physical implementation, in particular 
the taking of limits. The proof route that I have been pursuing consists of two main steps. The first 
is to establish a universal description of what a computation is, and more importantly a generic 
description of all possible dynamical implementations of a computation. The second step, which 
is the PSRP proper, consists of showing that the PSRP leads to an infinite chain of dynamical 
systems, each of greater complexity than the last. 



2 Step One: The Building Blocks 

In this section we describe the major entities needed to construct the Physical Self-Reference
Problem. The main components are a computational model, and a hopefully universal description of
any physical implementation of that computational model. 

2.1 Computational Machinery 

In order to facilitate later steps it is necessary to modify the standard Turing Machine (TM) in two
ways. The first is to make it into a Mealy Machine. This is done in two parts. The first is to require
transition outputs to be labelled by both a new symbol and a 'move', where a move of any integer $k$
is allowed. In other words transitions are labelled by $(\sigma : (\sigma', k))$ where
$\sigma, \sigma' \in \Sigma$ are elements of the alphabet of symbols and $k \in \mathbb{Z}$ indicates
the next cell to act on. These modifications are a superset of the standard TM model and thus have
equal computational power. Conceptually, instead of thinking of a boxcar on rails we are thinking of
a collection of variables connected by a switching network to a collection of operators
$\{M_j \mid M_j : \Sigma \to \Sigma\}$ where $j \in J_{op} \subset \mathbb{N}$ ($J_{op}$ finite)
(see figure 1, p. 2).

Figure 1: Three views of the Turing Machine as a switched network with a Mealy Machine state graph

The two outputs of a transition indicate 1) the operator to employ on the current variable, and 2) 
the next variable to act on. The second step consists of requiring that all nodes of the state machine 
have outgoing labels for every possible $\sigma \in \Sigma$. This change too has no effect on functionality and
has the effect of making the modified TM into a well defined discrete time dynamical system, given 
by the following equation: 
where we are showing the description for a single step from time $t_i$ to time $t_{i+1}$. The complete
description is then a product of matrices of the above form where the index $i$ runs from 0, the start
time, to $H \in \mathbb{N}$, the halt time. The subscript
$p(k, v(t), t_i) : \mathbb{Z} \times v(t) \times \mathbb{N} \to J_{op}$ indicates the actual
path traversed in the graph of the TM and provides the specific endomorphism $M_p$ that is applied
to variable $k$ ($v_k$) at time $t_i$. The $v(t)$ dependence is shorthand indicating that the path
depends on the prior data; as a result, generically the TM is a non-linear system. We observe that
the matrix is diagonal in the basis we are using. Notice that for any single time step all but one of
the $M_p$'s is an identity operator. We thus recover a result from the work of Blum, Shub and Smale
on the BSS Machine [1], that the input/output map of any computation is a polynomial over each
input/output pair. In fact in this case it is a monomial of operators acting on the initial variable
value or input, and it follows that in this basis the equation representing the entire computation,
like that for a single time step, is diagonal.
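
The single-step picture above can be illustrated with a minimal sketch. It assumes a finite tape, an absolute cell index for $k$, and a two-letter alphabet; the names `run`, `replay`, `OPS` and `DELTA` are hypothetical and chosen only for this illustration. Each step applies one operator $M_j$ to one cell and the identity to every other cell, and the recorded per-cell operator path reproduces the final tape as a composition (a monomial) of operators.

```python
from typing import Callable, Dict, List, Tuple

Symbol = int
Operator = Callable[[Symbol], Symbol]
# (state, symbol read) -> (operator index j, next cell k, next state)
Delta = Dict[Tuple[str, Symbol], Tuple[int, int, str]]


def run(ops: List[Operator], delta: Delta, tape: List[Symbol],
        state: str = "q0", cell: int = 0, max_steps: int = 1000):
    """Run until the state 'halt'; return the final tape and each cell's operator path."""
    tape = list(tape)
    paths: List[List[int]] = [[] for _ in tape]
    for _ in range(max_steps):
        if state == "halt":
            break
        j, k, state = delta[(state, tape[cell])]
        tape[cell] = ops[j](tape[cell])  # M_j acts on one cell, identity on the rest
        paths[cell].append(j)
        cell = k
    return tape, paths


def replay(ops: List[Operator], paths: List[List[int]], tape: List[Symbol]):
    """Apply each cell's recorded operators, in order, to its initial value."""
    out = []
    for value, path in zip(tape, paths):
        for j in path:
            value = ops[j](value)
        out.append(value)
    return out


# Hypothetical example machine: every state has a transition for every symbol.
OPS = [lambda s: s, lambda s: 1 - s]  # M_0 = identity, M_1 = flip
DELTA = {
    ("q0", 0): (1, 1, "q1"), ("q0", 1): (1, 1, "q1"),
    ("q1", 0): (0, 2, "q2"), ("q1", 1): (1, 2, "q2"),
    ("q2", 0): (1, 0, "halt"), ("q2", 1): (0, 0, "halt"),
}

final, paths = run(OPS, DELTA, [0, 1, 0])
print(final, paths)
print(replay(OPS, paths, [0, 1, 0]) == final)
```

Running the same machine on a different input gives a different operator path, which is the data dependence that makes the system non-linear in general.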

It turns out that there is a class of groups called 'Self-Similar Groups' [3] which are groups
generated by Mealy Machines. It follows that with the above computational model, any computable
function generates a self-similar group. The monograph Self-Similar Groups [3] provides an algebraic
formalism (extending that in Eilenberg [2]) for these groups in terms of the state graph of the Mealy
Machine. In particular the states $Q$ are a bi-module over $G_s$ where the group $G_s$ can naturally
be interpreted as a subgroup of the automorphism group $\mathrm{Aut}(\Sigma^*)$ of the rooted tree
$\Sigma^*$ extendable to $\mathrm{Aut}(\Sigma^\omega)$. The states $Q$ can be identified with the
direct product $\Sigma \times G_s$ of the symbol alphabet and the group. The left action is given by
$h \cdot (\sigma, g) = (h(\sigma), h|_\sigma g)$ where $h, g \in G_s$ and $\sigma \in \Sigma$. Here
$h(\sigma) = \sigma'$, which is the action of some endomorphism $M_j$ specified by the transition
between two nodes of the state graph. Similarly $h|_\sigma$ is the next state function, denoted
schematically by the line that connects two states in the state graph. The right action of the
bi-module $Q$ is given by $(\sigma, g) \cdot h = (\sigma, gh)$. In other words, given a word
$w \in \Sigma^*$ (or $\Sigma^\omega$), we have $g \cdot \sigma = \sigma' \cdot h$ if and only if
$g(\sigma w) = \sigma' h(w)$ and $g|_\sigma = h$ [3]. This means that the output letter and the next
state are encoded by the action of the input symbol on the bi-module $Q$. The module $Q$ can, as we
will see in the next section, be identified with the orbits of the physical system giving rise to the
computation under the action of the time evolution operator.
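
To make the rule $g(\sigma w) = \sigma' h(w)$ with $h = g|_\sigma$ concrete, here is a small sketch using the binary adding machine, a standard textbook example of a Mealy automaton generating a self-similar group. The example is chosen for illustration and is not taken from this essay; the names `ADDING_MACHINE`, `act` and `to_int` are hypothetical.

```python
from typing import Dict, List, Tuple

# Mealy automaton: (state, input letter) -> (output letter, next state).
# The next state plays the role of the restriction g|_sigma.
ADDING_MACHINE: Dict[Tuple[str, int], Tuple[int, str]] = {
    ("a", 0): (1, "e"),  # a(0w) = 1 e(w): no carry, continue with the identity
    ("a", 1): (0, "a"),  # a(1w) = 0 a(w): the carry propagates
    ("e", 0): (0, "e"),  # e is the identity state
    ("e", 1): (1, "e"),
}


def act(automaton: Dict[Tuple[str, int], Tuple[int, str]],
        state: str, word: List[int]) -> List[int]:
    """Apply the automaton state to a word, letter by letter."""
    out = []
    for letter in word:
        letter_out, state = automaton[(state, letter)]
        out.append(letter_out)
    return out


def to_int(word: List[int]) -> int:
    """Read a word as a binary number, least significant bit first."""
    return sum(bit << i for i, bit in enumerate(word))


word = [1, 1, 0, 0]                      # 3, least significant bit first
image = act(ADDING_MACHINE, "a", word)   # the state 'a' adds one
print(to_int(word), to_int(image))
```

On words of fixed length the state `a` acts as addition of one modulo a power of two, so the group it generates is infinite cyclic when acting on the whole tree.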

2.2 Dynamical Machinery 

In this section we describe the dynamical system associated to the computational system discussed in
the previous section. Ideally there would be some universal prescription for converting an algebraic
specification into a dynamical system. Unfortunately, while there is a minimal dynamical system we
can naturally associate to such a description [4], there does not appear to be a natural way to define
a stable dynamical system. Therefore we have to go through a tediously large range of cases, which
we will only summarize in brief here. The overall goal here is to develop a minimal specification
for a dynamical system in terms of the number of zeros and the number of functions needed
(minimally, in principle) to implement the computation.

To place the following discussion we first present a big-picture view of what we are going to do. 
We are going to first define the topology of a single variable, in as much generality as possible, and 
then define the other components we need, connectors and operators (see discussion of hardware 
implementation above) in terms of the topology of the variables. From these components we can 
thus build up a dynamical system that implements a particular computation. 

2.2.1 Variables 

The primary classification of variables from the point of view of the PSRP is into the categories of 
monolithic and composite coded variables. The key issue that divides them is how many physical 
functions are needed to build up one symbol of the alphabet. In monolithic coding only a single 
function is used (in principle), while in composite coding there exists a sub-symbol alphabet with 
each sub-symbol implemented minimally with a single function. Thus for composite codes $N_f > 1$,
while for monolithic coding $N_f = 1$, where $N_f$ is the number of functions needed to implement a
variable. Within each of these categories we then have a number of sub-categories: space coded,
function coded, and mixed space/function coded. The case of function coded also has within it a
large set of possibilities: amplitude coding, frequency coding, phase coding, or, as for instance in
the case of the brain, something even more complicated. The following illustration shows the major
categories of hardware implementations of variables (see figure 2, p. 4).

Figure 2: Monolithic vs. composite coding schemes. Note that all of the composite codes have multiple actual
functions per letter while all of the monolithic codes have only one. The general classes of code types are represented
in both the monolithic and composite cases, but we show only amplitude coding for the function coding case.



It is possible to describe all of these cases in a single mathematical description. A dynamical
system $\mathcal{V}$ which implements a variable $V$ contains a collection $\{V_\sigma\}$ of open sets
$V_\sigma \subset (\mathcal{U}, \mathcal{F})$ in the product space
$\mathcal{U} \times \mathcal{F} \subset \mathcal{V}$, where
$\mathcal{U} = \{U_i \subset \mathbb{R}^3 \mid U_i$ simply connected, open, $U_i \cap U_j = \emptyset$,
and $\bigcup_{i \in I} U_i \subset \mathbb{R}^3\}$ ($i \in I$, a finite index set, $|I| > 1$) is an
open set in physical space, and $\mathcal{F} = \bigsqcup_{i \in I} \mathcal{F}_i$ is the disjoint
union of the rings of continuous functions $\mathcal{F}_i$ on each of the components $U_i$ of
$\mathcal{U}$. Intuitively what we are saying overall is that there is a collection of open sets in
space, and on those open sets functions sufficiently close to an attractor map to a specific variable
value $\sigma$. More rigorously, there exists a map
$\mathscr{S} : (\mathcal{U}, \mathcal{F}) \to \Sigma$ from a variable to the symbol alphabet, such
that $(x, f(x)) \in V_\sigma \Leftrightarrow \mathscr{S}(x, f(x)) = \sigma$. $\mathscr{S}$ is a union
of constant maps whose domain is
$\mathrm{Dom}(\mathscr{S}) = \bigcup_{\sigma \in \Sigma} V_\sigma \subset (\mathcal{U}, \mathcal{F})$.

For monolithic coding the number of attractors $N_A$ equals the size of the alphabet: $N_A = N_\Sigma$.
For a composite code, let the subsymbol alphabet be $T$ and the number of active functions be
$N_f$; then the size of the alphabet is $N_\Sigma = |T|^{N_f}$. The number of attractors for a
composite code is

$$N_A = |T| N_f$$

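
As a quick numerical check of these two counts, the following sketch computes the alphabet size $|T|^{N_f}$ and the attractor count $|T| N_f$. The function names are hypothetical, and the 8-function binary code is just an illustrative choice of parameters.

```python
def alphabet_size(subsymbols: int, n_functions: int) -> int:
    """Number of symbols representable: |T| ** N_f."""
    return subsymbols ** n_functions


def attractor_count(subsymbols: int, n_functions: int) -> int:
    """Number of attractors needed: |T| * N_f."""
    return subsymbols * n_functions


# Composite code: binary sub-symbols carried by 8 functions.
print(alphabet_size(2, 8), attractor_count(2, 8))

# Monolithic code for the same alphabet: one function, one attractor per symbol.
print(alphabet_size(256, 1), attractor_count(256, 1))
```

The monolithic case is recovered by setting $N_f = 1$ and $|T| = N_\Sigma$, where both counts coincide.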
2.2.2 Connectors and Operators

The basic requirement that we have for a connector is that it preserve the mapping of the variable,
that is, that it take an equivalence class of functions at the start of the connector to the same
equivalence class at the end of the connector. Clearly the easiest, but not the only, way to do 
this is to preserve the equivalence class throughout the connector. Furthermore we assume that it 
occupies some physical space and thus consists of a collection of connected open sets. The simplest 
case is where the number of components of a connector is equal to the number of components of a 
variable. 

The operators too are defined on a number of connected components equal to the number
of components of the variables. Any valid operator $\chi$ in the dynamical system $Y$ has to have
the property that it takes a letter $\sigma$ to some other letter $\sigma'$, so
$\chi_{\sigma,\sigma'} : (\mathcal{U}, \mathcal{F}) \to (\mathcal{U}, \mathcal{F})$ such
that $V_{\sigma'} \supseteq \chi_{\sigma,\sigma'} V_\sigma$. In words, the map $\chi_{\sigma,\sigma'}$
has to be contracting on equivalence classes. For any symbol transformation there is an entire
equivalence class of physical operators effecting the desired transformation, and any actual operator
is one member of this equivalence class. The transformations associated with each of the variable
types are shown below (figure 3, p. 6). The most generic statement about the form of the operators is
that they are vector fields on braids in 3-space. In the case of composite coding a secondary
coupling exists in order to move information from one sub-symbol to another.
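
The containment requirement on an operator can be illustrated in a deliberately simplified one-dimensional setting. The sketch below assumes that each equivalence class is an open interval of function values around an attractor and that the operator is an affine contraction; both are assumptions made only for this illustration, and the names `make_operator` and `maps_into` are hypothetical. The sampling check is a sanity test, not a proof of containment.

```python
from typing import Callable, Tuple

Interval = Tuple[float, float]


def make_operator(source_center: float, target_center: float,
                  rate: float = 0.25) -> Callable[[float], float]:
    """Hypothetical operator: shrink deviations from the source attractor
    and re-centre them on the target attractor."""
    def chi(y: float) -> float:
        return target_center + rate * (y - source_center)
    return chi


def maps_into(chi: Callable[[float], float], source: Interval,
              target: Interval, samples: int = 101) -> bool:
    """Check on interior sample points that chi sends source into target."""
    lo, hi = source
    step = (hi - lo) / (samples + 1)
    points = (lo + (i + 1) * step for i in range(samples))
    return all(target[0] < chi(y) < target[1] for y in points)


V0: Interval = (-0.2, 0.2)   # values read as the letter 0
V1: Interval = (0.8, 1.2)    # values read as the letter 1
flip_0_to_1 = make_operator(0.0, 1.0)

print(maps_into(flip_0_to_1, V0, V1))   # True: the image of V0 lies inside V1
print(maps_into(lambda y: y, V0, V1))   # False: the identity does not change the letter
```

Any other map with the same containment property would implement the same symbol transformation, which is the sense in which an actual operator is one member of an equivalence class.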

At this point we have developed a formalism for describing the hardware underlying a Turing
Machine. It is clear that this formalism implies the existence of some differential equation which
can describe the time evolution of the system in question. However, in order to write down an
actual equation we would have to choose a specific form for the variables, and their coding, which
would limit the universality of the equation and the ensuing discussion. We can nevertheless give a
graphical representation of it (see figure 4, p. 6).

Figure 3: Illustration of the transformation effected by an operator for each coding scheme






Figure 4: Turing Machine in implicit time form. In this picture each of the square boxes represents a member of an 
equivalence class of endomorphisms of the physical substrate of a variable. Black boxes are identity endomorphisms, 
the other colors represent non-identity endomorphisms. 

The main point is that this is not a symmetric space: different locations in the space are different
either in terms of the topology (pure space), vector fields on the topology (pure function), or both
(mixed). So the equation is in fact a partial differential equation. This seems to imply that all of the
foregoing discussion of actions of discrete groups on this space is best understood in the language
of orbifolds and Lie groupoids. We see also that generically the associated differential equation
is non-linear: the operations applied depend on the initial data. What stands out clearly in the
above picture is the presence of the group $G_s$. We can see how the state machine is expressed in the
differential equation and embodies the essential dynamics, and we recall that $G_s$ incorporates the
non-linear data dependence of the equation. It turns out that the word structure of the boundary
of the rooted tree $\Sigma^\omega$ generated by the group $G_s$ is naturally a groupoid [6]. This
appears to be a different groupoid than the groupoid of the operators of the differential equation,
though they may be related.

3 Step Two: The PSRP Recursion 

Having developed a description of both the behavior and implementation of a generic computation, 
we now consider the Physical Self-Reference Problem itself. Throughout this section, in the interests
of brevity we describe only the major points with a minimum of motivation and details. 

3.1 Seed Problem 

The PSRP requires a seed problem. Any non-linear problem will do, so we focus on a non-linear
problem having 2 zeros. We have in mind a physical system that gives rise to behavior similar to
that of a simple binary symbol, and we call this system $\mathcal{D}_v$ for "Dynamical system of a
Variable". We define two equivalence classes of function values $V_{0,\tau}$ and $V_{1,\tau}$ by
nearness to one or the other of the zeros at some specific time $\tau$:
$V_{i,\tau} = \{(V \subset \mathbb{R}^3, V_i = (l_i, u_i) \subset \mathbb{R}) \mid y \in V_i \Rightarrow |y - Z_i| < \epsilon\}$
for some $\epsilon \in \mathbb{R}$, $i = 0, 1$, where $y = f(x, \tau)$ and
$f(x, t) \in H(\mathcal{D}_v, \mathbb{R})$ (the set $V$ is the same in both of the $V_{i,\tau}$). By
making the size of the two open sets $V_{i,\tau}$ very much larger than the size of the complement of
their union,
$\|V_{0,\tau} \cup V_{1,\tau}\| \gg \|\overline{V_{0,\tau}} \cap \overline{V_{1,\tau}}\|$, we ensure
that the system will spend most of its time in one or the other of the $V_{i,\tau}$, and the
syntactic description of the variable in figure 5 (p. 7) will be coarse, but correct to a high degree
of approximation. As a result of these manipulations we can associate a graph $V_{ps}$ with the
dynamical system $\mathcal{D}_v$:




Figure 5: States of a binary variable. On the left we show the dynamical system $\mathcal{D}_v$, including the
neighborhoods of the zeros used to define the syntactic structure. On the right we have the resulting discrete model.

Evidently we can ask for a model $M_1$ of the system $\mathcal{D}_v$. Hence we have the following
picture relating the three objects we have so far:

Figure 6: Initial segment of the PSRP model chain

The map $\mathscr{S}_v : \mathcal{D}_v \to V_{ps}$ is precisely the map we had in mind in defining the
$V_{i,\tau}$. It assigns an equivalence class of function values $V_{i,\tau}$ ($i = 0, 1$) to a
function value at some specified time $\tau$:
$\mathscr{S}_v(f, \tau) = V_{i,\tau} \Leftrightarrow |f(x, \tau) - Z_i| < \epsilon$.
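
The assignment of a function value to an equivalence class by nearness to a zero can be sketched as a small classifier. The sketch assumes scalar function values and neighborhoods that do not overlap; the name `classify` and the numerical values are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence


def classify(y: float, zeros: Sequence[float], eps: float) -> Optional[int]:
    """Return the index i of the zero Z_i with |y - Z_i| < eps, or None if
    y lies outside every neighborhood (no symbol is assigned there)."""
    for i, z in enumerate(zeros):
        if abs(y - z) < eps:
            return i
    return None


ZEROS = (-1.0, 1.0)   # hypothetical positions of the two zeros
EPS = 0.9             # large neighborhoods, small uncovered gap around y = 0

print(classify(0.3, ZEROS, EPS))     # 1
print(classify(-0.95, ZEROS, EPS))   # 0
print(classify(0.0, ZEROS, EPS))     # None
```

The value `None` corresponds to the small region outside both neighborhoods, where the syntactic description of the variable is undefined.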

The map $e : (\mathbb{N}, \Sigma, \Sigma, \Sigma, \Sigma) \to (\mathcal{D}_v, \mathbb{R}, \mathbb{R}, \mathbb{R}, \mathbb{R})$
takes a variable index $i \in \mathbb{N}$ and a quadruple of variable values
$\sigma_0, \sigma_1, \sigma_2, \sigma_3 \in \Sigma$ to a point $p_i \in \mathcal{D}_v$ and a quadruple
$(u_0, u_1, u_2, u_3) \in \mathbb{R}^4$. This gives a family of first order function germs whose
representative is $f(p \in V_{i,\sigma_0,\sigma_1,\sigma_2,\sigma_3}) = u_0 + u_1 x + u_2 y + u_3 z$,
defined on an open set
$V_{i,\sigma_0,\sigma_1,\sigma_2,\sigma_3} = (U_i, (u_0 - l, u_0 + l), (u_1 - l, u_1 + l), (u_2 - l, u_2 + l), (u_3 - l, u_3 + l))$
surrounding $(p_i, u_0, u_1, u_2, u_3)$, where
$e(i, \sigma_0, \sigma_1, \sigma_2, \sigma_3) = (p_i, u_0, u_1, u_2, u_3)$ and $l$ is the half-width
of each interval. In other words $e = \mathscr{S}^{-1}$ is the inverse of a map $\mathscr{S}$
(figure 7, p. 9) from a set of families $\mathcal{F}_{i,\sigma}$ of germs of functions on
$\mathcal{D}_v$ to the symbols in $M_{1ps}$ indexed by $i$.

In plainer language, the values of a quadruple of variables tell us the value of the function and
its first derivatives with some uncertainty determined by the size of the symbol alphabet and the
number of symbols used to encode the complete function space of the manifold. In view of the
inverse relationship between the map $e$ and the map $\mathscr{S}$ we will from now on refer to the
map $e$ simply as $\mathscr{S}^{-1}$.

The map $p : M_{1ps} \to V_{ps}$ from the model to the graph is a simple projection. In brief, for
variables of $M_1$ that code for function values (not derivatives), when the value associated with the
variable is in one of the sets $V_{i,\tau}$ then $p$ maps that syntactic 'value' (say $\sigma$) to the
appropriate $V_{i,\tau}$. In other words
$p(v_k, \sigma) = V_{i,\tau} \Leftrightarrow |f(q) - Z_i| < \epsilon$ where
$f(q) = \mathscr{S}^{-1}(k, \sigma, \Sigma, \Sigma, \Sigma)$ is the value of the function at the point
$q$ given by the map $\mathscr{S}^{-1}$. It follows that the set of maps shown in the diagram
(figure 6, p. 8) form a commutative diagram, at least on the restriction of $\mathscr{S}$ to
$\mathrm{Dom}(p)$, the domain of $p$. That is,
$\mathscr{S}_v = p \circ \mathscr{S}|_{\mathrm{Dom}(p)}$.

Figure 7: The map $\mathscr{S}$ from a manifold to its model. For simplicity we are only showing a single
derivative in this figure. (Panels: Manifold, Model.)

3.2 The PSRP Recursion

Let us now consider the model $M_1$ independently of the variable it is a model of. Evidently if $M_1$
is to exist it must also be a physical system, and from the foregoing, we can look for a minimal
implementation of it. We thus get the following diagram:




Figure 8: First step of the PSRP recursion 

Here $\mathcal{D}_1$ is the dynamical system that implements or creates the symbol system $M_1$.
Evidently $\mathcal{D}_1$ is a 'bigger' dynamical system than $\mathcal{D}_v$. Where $\mathcal{D}_v$
had precisely two zeros, $\mathcal{D}_1$ has a number of zeros strictly greater than two. Generically
the variables of $M_1$ are elements in a very large alphabet, say
$\Sigma_{M_1} = \mathbb{Z}/(2^{32})\mathbb{Z}$, so even in a composite coding scheme with a subsymbol
alphabet of $\mathbb{Z}/2\mathbb{Z}$ the number of zeros per variable is 64, a factor of 32 greater
than the number of zeros in $\mathcal{D}_v$, before multiplication by the number of variables. Nor
does this exhaust the set of zeros of $M_1$, because the vector fields implementing the endomorphisms
also have zeros for function coding, or holes in the topology in the case of space coding.

The map $p_D$ takes quadruples of zero-th order function germs in $\mathcal{D}_1$ (physical instances
of variables in $M_1$) into first order function germs in $\mathcal{D}_v$:
$p_D : \{(U_{v_j}, \mathcal{F}_{v_j})\}$ $(j = 0, 1, 2, 3)$ $\to (\mathcal{D}_v, \mathbb{R}^4)$ is
given by $p_D(V_{\sigma_0}, V_{\sigma_1}, V_{\sigma_2}, V_{\sigma_3}) = (U_i, f)$ where the
representative $f = u_0 + u_1 x + u_2 y + u_3 z$ is given by the map
$\mathscr{S}^{-1}(i, \sigma_0, \sigma_1, \sigma_2, \sigma_3) = (U_i, u_0, u_1, u_2, u_3)$, which also
specifies the open set $U_i$. Note that $p_D$ is bijective but is only defined on the equivalence
classes $V_{\sigma_j}$. Furthermore $p_D^{-1}$ is an expanding map. It takes any member of a family of
first order function germs in $\mathcal{D}_v$ to four topologically distinct components of
$\mathcal{D}_1$. In addition we have that homotopy equivalent function germs in $\mathcal{D}_v$ become
topologically separated in $\mathcal{D}_1$. We note also two commutative diagrams:
$p_D = \mathscr{S}^{-1} \circ \mathscr{S}_1$ and
$p_D|_{\mathrm{Dom}(\mathscr{S})} \circ \mathscr{S}_v = \mathscr{S}_1|_{\mathrm{Dom}(p)} \circ p$.
Finally we note that for the case of composite coding we have a distinct pair
$(U_{v_j}, \mathcal{F}_{v_j})$ for each sub-symbol. Therefore in this case the map
$p_D : \{(U_{v_j}, \mathcal{F}_{v_j})\}$ $(j = 0, 1, 2, \ldots, 4N_f - 1)$
$\to (\mathcal{D}_v, \mathbb{R}^4)$ takes one first order function germ of $\mathcal{D}_v$ into
$4N_f$ topological components of $\mathcal{D}_1$.




Figure 9: Second step of the PSRP recursion. Each model $M_j$ is a model of the dynamical system $\mathcal{D}_{j-1}$.
Each $M_j$ is implemented by a dynamical system $\mathcal{D}_j$ which is more complex than the dynamical system
$\mathcal{D}_{j-1}$. Thus no $M_j$ is a model of itself. The solution to the physical self reference problem is thus
the profinite completion of this sequence of models.



We notice that in figure 8 (p. 9), $V_{ps}$ is only a lowest order model of $\mathcal{D}_v$, and
similarly $M_1$ is only a lowest order model of $\mathcal{D}_1$. Thus we need to solve for the
behavior of $\mathcal{D}_1$. Clearly this requires introducing a new model $M_2$, but $M_2$ will then
have a dynamical system that implements it, and this new dynamical system $\mathcal{D}_2$ will be more
complex than $\mathcal{D}_1$, and its behavior will be effectively unknown, except to lowest order by
$M_2$. Therefore a new model will be needed, and so on (see figure 9, p. 10). None of the models $M_j$
in this chain is capable of describing syntactically the behavior of the functions on the dynamical
system $\mathcal{D}_j$ giving rise to $M_j$ (again except to lowest order); therefore none of these
models is a solution of the PSRP. It seems reasonable to conjecture that the forward limit of the
models $M_j$, and the corresponding forward limit of the dynamical systems $\mathcal{D}_j$, would be
able to solve the problem; however, there are significant obstacles to achieving this limit.

We have shown previously that the number of zeros $N_{z,j}$ of $\mathcal{D}_j$ is strictly greater
than the number of zeros $N_{z,j-1}$ of $\mathcal{D}_{j-1}$; indeed, $N_{z,j} = \lambda N_{z,j-1}$ for
some $\lambda \in \mathbb{N}$ where $\lambda > 1$. Thus $N_{z,j} \geq \lambda^{j-1} N_{z,1}$. Let us
say that a function is 'well approximated' if we have $N_p$ points at which the function value and
the first derivatives are known (this constant number of terms for an entire function seems to
undercount the number really needed). Further, let us denote by $N_{V,j}$ the number of variables
needed in model $M_j$, and by $N_{f,j}$ the number of functions in $\mathcal{D}_j$. After a little
calculation, one can show that if we stop at any finite stage $j$ the number of symbols we need is
$N_{V,j+1} = 4^{j+1} N_p^{j+1} N_f^{j+1}$. On the other hand the number of symbols we have is
$N_{V,j} = 4^j N_p^j N_f^j$, so the difference between what we don't know and what we do is
$N_{V,j+1} - N_{V,j} = 4^j N_p^j N_f^j (4 N_p N_f - 1)$. We have exponential growth in the error term.
Thus for any finite approximation there is no convergence. This error estimate seems to be a coarse
lower bound. The reason is that the requirement of a constant number of terms implicitly assumes that
the functions all have similar amounts of 'curviness'. However, as the number of zeros of the vector
fields acting on the functions increases, the 'curviness' grows: when the system being studied is
more complicated (has more zeros), the number of terms needed will increase. So as we move further
out along the chain of figure 9 (p. 10) each function will require more terms for an approximation as
accurate as those in the previous systems.
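
The growth of the shortfall can be tabulated directly. The sketch below takes the stage-$j$ symbol count to be $N_{V,j} = 4^j N_p^j N_f^j$, as in the paragraph above, with $N_p$ and $N_f$ held constant; the function names and the sample values $N_p = 2$, $N_f = 1$ are hypothetical, chosen only to keep the numbers small.

```python
def symbols_at_stage(j: int, n_points: int, n_functions: int) -> int:
    """N_{V,j} = 4^j * N_p^j * N_f^j, the number of symbols available at stage j."""
    return (4 * n_points * n_functions) ** j


def shortfall(j: int, n_points: int, n_functions: int) -> int:
    """N_{V,j+1} - N_{V,j}: symbols needed to describe stage j minus symbols available."""
    return (symbols_at_stage(j + 1, n_points, n_functions)
            - symbols_at_stage(j, n_points, n_functions))


N_P, N_F = 2, 1   # illustrative values only
for j in range(1, 6):
    have = symbols_at_stage(j, N_P, N_F)
    gap = shortfall(j, N_P, N_F)
    # The gap matches the closed form 4^j N_p^j N_f^j (4 N_p N_f - 1).
    assert gap == have * (4 * N_P * N_F - 1)
    print(j, have, gap)
```

Each additional stage multiplies the shortfall by the constant factor $4 N_p N_f$, so stopping at a later stage leaves a larger gap, not a smaller one.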

3.3 The PSRP Limit Space 

It would obviously be ideal to be able to describe the limit space of the PSRP recursion.
Unfortunately at this time that has not been worked out. The monograph Self-Similar Groups [3]
provides the best current lead in that direction. We have mentioned the intimate connection between
self-similar groups and automata already; however, the work [3] provides a much more general formalism
for these groups, called the iterated monodromy group of a partial self-covering of topological spaces.
It seems clear that a 'partial self-covering' is precisely the kind of object that is going to be
capable of solving the PSRP. The most general formalism for describing such a covering, from a talk [7]
given by the author of [3], requires a finite covering map $p : \mathcal{M}_1 \to \mathcal{M}$ and a
continuous map $\iota : \mathcal{M}_1 \to \mathcal{M}$; these in turn induce maps
$p_* : \Pi(\mathcal{M}_1) \to \Pi(\mathcal{M})$ and $\iota_* : \Pi(\mathcal{M}_1) \to \Pi(\mathcal{M})$.
The result is that $\iota_*$ is a virtual endomorphism of $\Pi(\mathcal{M}_1)$, a self-similar group,
and the iterated monodromy of the (topological) correspondence between $\mathcal{M}_1$ and
$\mathcal{M}$; furthermore the fundamental group $\Pi(\mathcal{M}_1)$ is a sub-group of
$\Pi(\mathcal{M})$ through the isomorphism $p_*$.

Unfortunately, the maps we have so far described are neither continuous nor a covering, and in
fact for the dynamical systems $\mathcal{D}_i$ (which is what we are really interested in) we really
only have the one map $p_D$ for each pair. One potential solution is to change the 'interpretational
schema' of the syntax of the models $M_j$. (More properly we should say change the model space of the
syntax, but that label seems likely to add significant confusion, overlapping as it does with the use
of the word model that we have already employed.) If we thought of the symbols of each model as
amplitudes for a set of basis functions, such as for example polynomials or exponentials, then we
would naturally have a continuous function defined on the systems $\mathcal{D}_i$, but we would still
need to find a relationship between those functions across different systems $\mathcal{D}_i$. There
are reasons to believe that a more thorough understanding of the material in [3] may yet yield the
needed maps. Finally, a more direct approach using the profinite completion of the fundamental
groupoids of the systems $\mathcal{D}_i$ also seems quite promising.